Documents to Data (Text Processing)
Synopsis
Generates a data set from documents.
Description
This operator generates a data set from a collection of documents. For each document in the collection, one example is added to the data set. The text contained in the document is stored in a nominal attribute. If the documents have a label or meta data associated with them, a label attribute or attributes for the meta data are created, respectively.
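The transformation described above can be pictured with a minimal Python sketch. This is not the operator's actual implementation: the `Document` class and the `documents_to_data` function are hypothetical names introduced for illustration, and the keyword arguments only loosely mirror the operator's parameters. It assumes each document carries its text, an optional label, and a dictionary of meta data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """Hypothetical stand-in for a document with optional label and meta data."""
    text: str
    label: Optional[str] = None
    meta: dict = field(default_factory=dict)


def documents_to_data(documents, text_attribute="text", label_attribute="label",
                      add_meta_information=True):
    """Build one row (a dict) per document; illustrative only."""
    rows = []
    for doc in documents:
        # The document text goes into the attribute named by text_attribute.
        row = {text_attribute: doc.text}
        # A label column is only filled in when the document has a label.
        if doc.label is not None:
            row[label_attribute] = doc.label
        # Meta data such as filename or date becomes additional columns,
        # without overwriting the text or label columns.
        if add_meta_information:
            for key, value in doc.meta.items():
                row.setdefault(key, value)
        rows.append(row)
    return rows


docs = [
    Document("First text.", label="positive", meta={"filename": "a.txt"}),
    Document("Second text.", label="negative", meta={"filename": "b.txt"}),
]
rows = documents_to_data(docs, text_attribute="content")
for row in rows:
    print(row)
```

Unlike a real example set, where every example shares the same attributes, this sketch leaves out keys that a document does not provide; a fuller version would fill those with missing values.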
Input
- documents (Collection)
The documents port.
Output
- example set (Data table)
The example set port.
Parameters
- text attribute
The name of the text attribute.
- label attribute
The name of the label attribute.
- add meta information
If checked, available meta information about the text, such as filename and date, is added as attributes.
- datamanagement
Determines how the data is represented internally.